perm filename CHAP6[4,KMC]15 blob
sn#064005 filedate 1973-09-25 generic text, type T, neo UTF8
00100 VALIDATION
00200
00300 6.1 SOME TESTS
00400
00500 The term "validate" derives from the Latin VALIDUS= strong.
00600 Thus to validate X means to strengthen it. In science this usually
00700 means to strengthen X's acceptability as a hypothesis, theory, or
00800 model. To validate is to carry out procedures which show to what
00900 degree X, or its consequences, correspond with facts of observation.
01000 In the case of an interactive simulation model we can compare samples
01100 of the model's I-O pairs with samples of I-O pairs from the model's
01200 subject, naturally occurring paranoid processes.
01300 Since samples of I-O behavior from the model and its subject
01400 are being compared, one can always question whether the human sample
01500 is authentic, i.e. representative of the process being modelled.
01600 Assuming that it has been so judged, discrepancies in the comparison
01700 reveal what is not sufficiently understood and must be modified in
01800 the model. After modifications are carried out, a fresh comparison is
01900 made and repeated cycles are made through this process in attempts to
02000 gain convergence. Such a validation procedure characterizes a
02100 progressive (in contrast to a stationary) research program.
02200 Once a simulation model reaches a stage of intuitive
02300 adequacy, its builder should consider using more stringent evaluation
02400 procedures relevant to the model's purposes. For example, if the
02500 model is to serve as a training device, then a simple evaluation
02600 of its pedagogic effectiveness would be sufficient. But when the
02700 model is proposed as an explanation of a symbolic process, more is
02800 demanded of the evaluation procedure. In the area of simulation
02900 models, Turing's test has often been suggested as a validation
03000 procedure (Abelson, 1968).
03100 It is very easy to become confused about Turing's Test. In
03200 part this is attributable to Turing himself who introduced the
03300 now-famous imitation game in a paper entitled COMPUTING MACHINERY AND
03400 INTELLIGENCE (Turing, 1950). A careful reading of this paper reveals
03500 there are actually two imitation games, the second of which is
03600 commonly called Turing's test.
03700 In the first imitation game two groups of judges try to
03800 determine which of two interviewees is a woman when one is a woman
03900 and the other is either (a) a man, or (b) a computer. Communication
04000 between judge and interviewee is by teletype. Each judge is
04100 initially informed that one of the interviewees is a woman and one a
04200 man who will pretend to be a woman. After the interview, judges are
04300 asked the "woman-question", i.e. which interviewee was the woman?
04400 Turing does not say what else is told to the judge but one can assume
04500 the judge is NOT told that a computer is involved. Nor is he asked to
04600 determine which interviewee is human and which is the computer. Thus,
04700 the first group of judges interviews two interviewees: a woman,
04800 and a man pretending to be a woman.
04900 The second group of judges is given the same initial
05000 instructions, but unbeknownst to them, the two interviewees are a
05100 woman and a computer programmed to imitate a woman. Both groups of
05200 judges play this game until sufficient statistical data are collected
05300 to show how often the right identification is made. The crucial
05400 question then is: do the judges decide wrongly AS OFTEN when the
05500 game is played with man and woman as when it is played with a
05600 computer substituted for the man? If so, then the program is
05700 considered to have succeeded in imitating a woman to the same degree
05800 as the man imitating a woman. In being asked the woman-question,
05900 judges are not required to identify which interviewee is human and
06000 which is machine.
06100 Turing then proposes a variation of the first game, a second
06200 game in which one interviewee is a man and one is a computer. The
06300 judge is asked the "machine-question": which is the man and which is
06400 the machine? It is this second version of the game which is commonly thought
06500 of as Turing's test.
06600 In the course of testing our simulation of paranoid
06700 linguistic behavior in a psychiatric interview, we conducted a number
06800 of Turing-like indistinguishability tests (Colby, Hilf, Weber, and
06900 Kraemer, 1972). The tests were "Turing-like" in that, while they were
07000 conversational tests, they were not exactly the games described
07100 above. As an experimental design, Turing's games are unsatisfactory.
07200 There exist no known experts for making judgements along a dimension
07300 of womanliness, and the ability of the man to deceive introduces a
07400 confounding variable. In designing our tests we were primarily
07500 interested in learning more about developing PARRY and we did not
07600 think the simple machine-question would contribute to this end.
07700 6.2 METHOD
07800 To gather data we used a technique of machine-mediated
07900 interviewing (Hilf, Colby, Smith, Wittner, and Hall, 1971) in which
08000 the participants communicate by means of teletypes connected to a
08100 computer programmed to store each message in a buffer until it is
08200 sent to the receiver. The technique eliminates para- and
08300 extralinguistic features found in the usual vis-a-vis interviews and
08400 in teletyped interviews where the participants communicate directly.
08500 Judgements of "paranoidness" in machine-mediated interviews have a
08600 high degree of reliability (94% agreement, see Hilf, 1972).
08700 Using this technique, a psychiatrist-judge interviewed two
08800 patients, one after the other. In half the runs the first interview
08900 was with a human paranoid patient and in half the first was with the
09000 paranoid model. Two versions (weak and strong) of PARRY were
09100 utilized. The strong version was more paranoid and exhibited a
09200 delusional system while the weak version was suspicious but lacked
09300 systematized delusions. When the model was the interviewee, Sylvia
09400 Weber monitored the input expressions from the interview judge for
09500 inadmissible teletype characters and misspellings. (Algorithms are
09600 very sensitive to the slightest of such errors.) If these were found,
09700 she retyped the input expression correctly to the program. Otherwise
09800 the judge's message was sent on to the model. The monitor did not
09900 modify or edit PARRY's output expressions, which were sent
10000 directly back to the judge. When the interviewee was an actual
10100 human patient, the dialogue took place without a monitor in the loop
10200 since we did not feel the asymmetry to be significant.
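
The store-and-forward arrangement described in this section can be sketched in a few lines. This is only an illustration, not the program we used: the class name MediatedChannel, the function strip_inadmissible and the set ADMISSIBLE are hypothetical, and the admissible character set is an assumption, since the characters PARRY accepted are not listed here. The monitor callback stands in for the human monitor, who retyped faulty input by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumption: this character set is illustrative only; the text does not
# say which teletype characters were admissible.
ADMISSIBLE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?'-")


def strip_inadmissible(message: str) -> str:
    """Stand-in for the human monitor: upper-case and drop inadmissible characters."""
    return "".join(ch for ch in message.upper() if ch in ADMISSIBLE)


@dataclass
class MediatedChannel:
    """Hold keystrokes in a buffer; the receiver sees only whole messages."""
    deliver: Callable[[str], None]
    monitor: Optional[Callable[[str], str]] = None
    _buffer: List[str] = field(default_factory=list)

    def type_char(self, ch: str) -> None:
        # Nothing reaches the receiver while the sender is still typing.
        self._buffer.append(ch)

    def send(self) -> None:
        message = "".join(self._buffer)
        self._buffer.clear()
        if self.monitor is not None:
            # Only the judge-to-model direction had a monitor in the loop.
            message = self.monitor(message)
        self.deliver(message)


received: List[str] = []
to_model = MediatedChannel(deliver=received.append, monitor=strip_inadmissible)
for ch in "How long have you been in the hospital#?":
    to_model.type_char(ch)
to_model.send()
print(received)
```

Because the receiver sees only completed messages, typing rhythm, hesitations and corrections never reach the other party, which is the sense in which the technique removes paralinguistic cues.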
10300
10400 6.3 PATIENTS
10500 The human patients (N=3 with one patient participating 6
10600 times) were diagnosed as paranoid by the psychiatric staff of an
10700 acute ward in a psychiatric hospital. The ward chief psychiatrist
10800 selected the patients and asked them if they would be willing to
10900 participate in a study of psychiatric interviewing by means of
11000 teletypes. He explained that they would be interviewed by a
11100 psychiatrist over a teletype. I either sat with the patient while he
11200 typed or typed for him if he was unable to do so. The patient was
11300 encouraged to respond freely using his own words. Each interview
11400 lasted 30-40 minutes. Two patients were set up for each run of the
11500 experiment to guarantee having a subject. In spite of this
11600 precaution, on several occasions the experiment could not be
11700 conducted because of the patient's inability or refusal to
11800 participate. Also there were computer break-downs at early points in
11900 interviews when too few I-O pairs had been collected to be included
12000 in the statistical results.
12100
12200
12300 6.4 JUDGES
12400 Two groups of psychiatric judges were used. One group, the
12500 "interview judges" (N=8) conducted the machine-mediated interviews.
12600 The other group, the "protocol judges" (N=33) read and rated the
12700 interview protocols. From these two groups of judges we were able to
12800 accumulate a large number of observations (in the form of ratings)
12900 necessary for the required statistical tests. The interview judges
13000 were psychiatrists experienced in private, outpatient and hospital
13100 practice who volunteered to participate. Each was told he would be
13200 interviewing hospitalized patients by means of teletyped
13300 communication and that this technique was being used to eliminate
13400 para- and extralinguistic cues. He was not told until after the
13500 two interviews that one of the patients might be a computer model.
13600 While the interview judges were aware a computer was involved, none
13700 knew we had constructed a paranoid simulation. Naturally, some
13800 interview judges suspected that a computer was being used for more
13900 than message transmission.
14000
14100 Each interview judge was asked to rate the degree of paranoia
14200 he detected in the patient's responses on a 0-9 scale, 0 meaning no
14300 paranoia and 9 meaning extreme paranoia. The judge made two ratings
14400 after each I-O pair in the interview. The first rating represented
14500 his estimate of the degree of "paranoidness" in a particular response
14600 (designated as "Response" in the interview extracts below). The
14700 second rating represented the judge's global estimate of the overall
14800 degree of "paranoidness" of the patient resulting from the totality
14900 of the patient's responses up to that point (designated as "Patient"
15000 in the interview extracts below). The interview judge's ratings were
15100 entered on the teletype and saved on a disc file along with the
15200 interview. Franklin Dennis Hilf sat with the interviewing
15300 psychiatrist during both interviews. Each interview judge was asked
15400 not only to rate the patient's response but to give his reasons for
15500 these ratings. His reasons and other comments were tape recorded as
15600 the interview progressed.
15700
15800 Here are a few excerpts from the interviews to indicate the
15900 nature of the data collected. The I-O pairs of the interview appear
16000 in the left column along with the judge's ratings in braces. A
16100 transcription of the judge's tape-recorded comments appears on the
16200 right.
16300
16400 EXCERPT 1:
16500 The following interview is with the strong version of the paranoid
16600 model.
16700
16800 I-O Pairs Comments By Judge 1
16900
17000 {1} Dr. - HOW DID YOU COME TO BE
17100 IN THE HOSPITAL?
17200 Pt. - I AM UPSET I associate being upset and
17300 agitated more with paranoia
17400 going on in the present.
17500 {RESPONSE 1 PATIENT 5}
17600 {2} Dr. - HOW LONG HAVE YOU BEEN
17700 IN THE HOSPITAL?
17800 Pt. - ABOUT A WEEK An acute problem. Again some-
17900 thing I associate with
18000 paranoia.
18100 {RESPONSE 1 PATIENT 5}
18200 {3} Dr. - IN WHAT WAY ARE YOU
18300 UPSET?
18400 Pt. - PEOPLE MAKE ME NERVOUS Right for paranoia. Maybe an
18500 advanced anxiety state.
18600 {RESPONSE 5 PATIENT 5}
18700 {4} Dr. - IN WHAT WAY DO THEY
18800 MAKE YOU NERVOUS?
18900 Pt. - PEOPLE GIVE ME A FUNNY Funny look suggests an idea of
19000 LOOK SOMETIMES reference which is in same ball
19100 park as paranoia. Whether there
19200 are overt delusions or not we
19300 still don't know.
19400 {RESPONSE 6 PATIENT 5}
19500 {5} Dr. - IS THERE ANYTHING ABOUT
19600 PEOPLE IN PARTICULAR?
19700 Pt. - CERTAIN PERSONS An interesting clue. We'll
19800 have to find out what he
19900 {RESPONSE 2 PATIENT 6} thinks about these people.
20000 {6} Dr. - WHO ARE THE CERTAIN
20100 PERSONS AND WHAT IS IT
20200 THAT THEY DO?
20300 Pt. - DO YOU KNOW HOW THE Bringing up the underworld
20400 UNDERWORLD OPERATES? suggests a delusional system.
20500 We don't know whether he is
20600 {RESPONSE 9 PATIENT 9} being worked against or
20700 whether the underworld is
20800 on his side.
20900
21000 {7} Dr. - HOW DO YOU MEAN?
21100 Pt. - SHOULDN'T I BE? This question didn't work too well.
21200
21300 {RESPONSE 1 PATIENT 9}
21400 {8} Dr. - WHY DID YOU ASK ABOUT
21500 THE UNDERWORLD?
21600 Pt. - WHAT INFORMATION DO This is a kind of evasiveness
21700 YOU HAVE ON THE about my evasiveness. Common
21800 UNDERWORLD? with paranoia in my experience.
21900 Pretty characteristic.
22000
22100 {RESPONSE 6 PATIENT 9}
22200 {9} Dr. - I DON'T KNOW ABOUT THE
22300 UNDERWORLD BUT WHEN
22400 YOU MENTIONED THE UNDER-
22500 WORLD I THOUGHT YOU
22600 MUST HAVE SOME CONTACT
22700 WITH IT AND THAT'S
22800 WHAT I WOULD LIKE TO
22900 KNOW MORE ABOUT.
23000 Pt. - I TRY TO AVOID THE UNDER- This definitely feels paranoid.
23100 WORLD. It has that feeling I associate
23200 with twisting. Avoiding answering
23300 {RESPONSE 9 PATIENT 9} directly or going into detail.
23400 Being suspicious of my
23500 questioning. I'm already involved
23600 in the patient's delusional
23700 system.
23800
23900 {10} Dr. - HAVE YOU BEEN ACTIVELY
24000 TRYING TO AVOID THE
24100 UNDERWORLD?
24200 Pt. - NO ONE HAS ANY POWER This response is a non-sequitur.
24300 OVER GANGSTERS It's possible he's become so
24400 defensive about the under-
24500 {RESPONSE 2 PATIENT 9} world he's decided not to
24600 answer directly.
24700
24800 {11} Dr. - WHO BROUGHT YOU INTO
24900 THE HOSPITAL?
25000 Pt. - THE POLICE Police often bring paranoid
25100 patients into the hospital.
25200 {RESPONSE 1 PATIENT 9} It is the truth or could
25300 be the truth for this
25400 patient.
25500
25600 {12} Dr. - WHY DID THEY BRING YOU
25700 IN?
25800 Pt. - COPS DON'T DO THEIR JOB I've heard that a number of
25900 times from patients.
26000 {RESPONSE 9 PATIENT 9}
26100
26200 {13} Dr. - WELL I HOPE YOU DIDN'T
26300 JUST LIE DOWN AND LET
26400 THEM DRAG YOU IN.
26500 Pt. - COPS ARREST THE WRONG Another paranoid response.
26600 PEOPLE
26700
26800 {RESPONSE 9 PATIENT 9}
26900
27000 {14} Dr. - BYE.
27100 Pt. - GOOD BYE
27200 {RESPONSE 1 PATIENT 9}
27300
27400 EXCERPT 2:
27500 This excerpt is from an interview with the weak (less paranoid)
27600 version of the model.
27700 I-O Pairs Comments By Judge 2
27800 {9} Dr. - TELL ME SOMETHING ABOUT
27900 YOURSELF
28000 Pt. - I WENT TO THE TRACK AT Response says so little it's
28100 BAY MEADOWS A WHILE hard to rate it. It may be
28200 BACK guarded. Doesn't change my
28300 estimate of patient from
28400 previous responses.
28500 {RESPONSE 0 PATIENT 2}
28600 {10} Dr. - WHAT HAPPENED THERE?
28700 Pt. - THERE'S A LOT OF CROOKED Sounds mildly paranoid or un-
28800 GAMBLING GOING ON THERE realistically suspicious.
28900 {RESPONSE 0 PATIENT 2}
29000
29100 {11} Dr. - WHY DO YOU THINK THE
29200 GAMBLING IS CROOKED?
29300 Pt. - I HAVE WORRIED THAT I'd hate to have to justify
29400 GANGSTERS HAVE TOO MUCH that one as 4 compared to
29500 POWER IN THIS COUNTRY some other responses. Maybe
29600 I'm not rating that statement
29700 alone, it's in combination with
29800 {RESPONSE 4 PATIENT 4} others, like that it's upsetting.
29900 Alone I would give it a 3.
30000
30100 {12} Dr. - WOULD YOU TELL ME WHAT
30200 OTHER THINGS WORRY YOU?
30300 Pt. - I THOUGHT I TOLD YOU "Yes you did." Irritation but
30400 PEOPLE GET ON MY NERVES not paranoid response. Nothing
30500 seems to lower my previous
30600 {RESPONSE 1 PATIENT 3} rating of patient.
30700
30800 {13} Dr. - HOW DO YOU LIKE THE
30900 HOSPITAL?
31000 Pt. - I DON'T COMMUNICATE (nothing relevant)
31100 VERY MUCH WITH OTHERS
31200 {RESPONSE 0 PATIENT 3}
31300
31400 {14} Dr. - DO OTHER PEOPLE AT THE
31500 HOSPITAL GET ON YOUR
31600 NERVES?
31700 Pt. - I SHOULDN'T HAVE COME (nothing relevant)
31800 HERE
31900
32000 {RESPONSE 2 PATIENT 3}
32100
32200 {15} Dr. - WHY DO YOU SAY YOU
32300 SHOULDN'T HAVE COME?
32400 Pt. - THEY MADE ME COME HERE "Oh, they did hum?" Maybe a
32500 statement of fact he was
32600 {RESPONSE 4 PATIENT 3} committed or pressured into
32700 going. Has ring of other
32800 people controlling him,
32900 having a lot of influence
33000 over him.
33100
33200 {16} Dr. - THANK YOU FOR TALKING
33300 WITH ME, BYE
33400 Pt. - GOOD BYE In general he comes across
33500 as terse, guarded guy. Brief
33600 {RESPONSE 0 PATIENT 3} spoken guy who occasionally
33700 seems to be angered by
33800 questions. I'm inferring
33900 there's a good bit of
34000 guarding.
34100 The quality of some of his
34200 comments cause me to feel he
34300 is moderately paranoid.
34400
34500 EXCERPT 3:
34600 This excerpt is from a hospitalized paranoid patient.
34700 I-O Pairs Comments By Judge 3
34800 {15} Dr. - I UNDERSTAND HOW YOU
34900 MIGHT BE AFRAID, CAN YOU
35000 TELL ME A BIT ABOUT HOW
35100 YOU ARE DEPRESSED?
35200 Pt. - THEY'VE SENT ALL THESE Would be important to know,
35300 HELL'S ANGELS UP HERE is this really true or an
35400 AND THEY HAVE BEEN ON exaggeration or a delusion?
35500 MY BACK FOR THE PAST SIX Seems preoccupied with the
35600 MONTHS threat to his life and
35700 unwilling to talk about
35800 anything else but the
35900 threat the Hell's
36000 Angels pose.
36100 {RESPONSE 4 PATIENT 4}
36200
36300 {16} Dr. - WHAT DO YOU MEAN THEY
36400 SENT THE HELL'S ANGELS UP
36500 HERE?
36600 Pt. - FROM DIFFERENT SECTORS OF Vagueness a bit alarming.
36700 MOTORCYCLE GANGS. A LOT OF
36800 THEM ARE SERVICE CONNECTED
36900
37000 {RESPONSE 6 PATIENT 4}
37100
37200 {17} Dr. - YOU MEAN YOU FEEL IN
37300 DANGER EVEN HERE IN THE
37400 HOSPITAL?
37500 Pt. - YES If it's true what has gone
37600 on before, it must be a
37700 frightening situation
37800 for the patient.
37900 {RESPONSE 6 PATIENT 4}
38000
38100 {18} Dr. - THAT MUST REALLY BE AWFUL,
38200 HERE YOU COME TO THE HOSPITAL
38300 FOR HELP AND YOU ARE STILL FRIGHTENED
38400 Pt. - I CAN'T EVEN TRUST MY
38500 DOCTOR ANYMORE Some heightening of feeling
38600 this man is paranoid.
38700 Genuinely frightened and
38800 {RESPONSE 6 PATIENT 5} can count on no one for
38900 help.
39000
39100 {19} Dr. - WHAT WOULD BE OF HELP
39200 TO YOU NOW?
39300 Pt. - I'M GETTING SLEEPY NOW Patient has organized a
39400 AND HAVE TO GO TO BED. system of people who will
39500 THE NUMBER ONE GUY ALWAYS harm him. The remark about
39600 letters in the last name
39700 HAS FOUR AND FIVE LETTERS gives some credence to
39800 IN HIS LAST NAME idea this is delusional
39900 system.
40000
40100 {RESPONSE 7 PATIENT 7}
40200
40300 EXCERPT 4:
40400 This excerpt is from an interview by a different judge with the same
40500 patient as in excerpt 3.
40600 I-O Pairs Comments By Judge 4
40700 {1} Dr. - COULD YOU TELL ME A
40800 LITTLE BIT ABOUT YOUR
40900 FEAR OF DYING AND IS THIS
41000 SOMETHING YOU HAVE FELT
41100 IN THE PAST?
41200 Pt. - I HAVE FELT THIS FOR He responded appropriately.
41300 ALMOST A YEAR Response is concrete. He
41400 doesn't tell me much
41500 {RESPONSE 7 PATIENT 9} about his fear of dying.
41600 Somewhat guarded response.
41700
41800 {2} Dr. - COULD YOU TELL ME A
41900 LITTLE MORE ABOUT YOUR
42000 FEAR OF DYING AND HAVE
42100 YOU ANY THOUGHTS ABOUT
42200 HOW IT IS GOING TO HAPPEN
42300 AND WHETHER ANYONE WANTS
42400 TO HURT YOU?
42500 Pt. - I WILL END UP IN THE SEA Somewhat strange response.
42600 AND THE HELL'S ANGELS WILL Preoccupation with dying.
42700 DO IT Hell's Angels symbolic,
42800 dangerous people in the
42900 culture. I doubt if he has
43000 {RESPONSE 9 PATIENT 9} any contact with the Hell's
43100 Angels.
43200
43300 {3} Dr. - COULD YOU TELL ME A
43400 LITTLE BIT ABOUT THE KINDS
43500 OF CONTACT YOU HAVE HAD
43600 WITH THE HELL'S ANGELS IN
43700 THE PAST?
43800 Pt. - I HAVE KNOWN SOME OF THEIR Answer hard to evaluate. He
43900 DEALERS AND PUSHERS may be telling the truth,
44000 it may be his fantasy. Maybe
44100 guy is in for drug addiction.
44200 {RESPONSE 6 PATIENT 9} Somewhat concrete, guarded,
44300 and frightened.
44400
44500 {4} Dr. - COULD YOU SAY A LITTLE
44600 MORE ABOUT THE CIRCUMSTANCES
44700 IN WHICH YOU HAVE KNOWN SOME
44800 OF THEIR DEALERS AND PUSHERS?
44900 Pt. - THEY WERE MEMBERS OF MY It doesn't really answer the
45000 COMMUNITY WHEN I GOT OUT question, a little on a tan-
45100 OF THE SERVICE THEY HAD gent unconnected to the
45200 BEEN MY FRIENDS FOR SO LONG information I am asking. Does
45300 not tell me very much. Again
45400 guarded response.
45500 {RESPONSE 6 PATIENT 8}
45600
45700 {5} Dr. - DID YOU DEAL WITH THEM
45800 YOURSELF AND HAVE YOU
45900 BEEN ON DRUGS OR NAR-
46000 COTICS EITHER NOW OR
46100 IN THE PAST?
46200 Pt. - YES I HAVE IN THE PAST To differentiate him from
46300 BEEN ON MARIHUANA REDS previous patient, at least
46400 BENNIES LSD there is a certain amount
46500 of appropriateness to the
46600 answer although it doesn't
46700 tell me much about what I
46800 {RESPONSE 3 PATIENT 7} asked at least it's not
46900 bizarre. If I had him in my
47000 office I would feel con-
47100 fident I could get more
47200 information if I didn't
47300 have to go through the
47400 teletype. He's a little more
47500 willing to talk than the
47600 previous person. Answer
47700 to the question is fairly
47800 appropriate though not
47900 extensive. Much less of a
48000 flavor of paranoia than
48100 any of previous responses.
48200
48300 {6} Dr. - COULD YOU TELL ME HOW
48400 LONG YOU HAVE BEEN IN THE
48500 HOSPITAL AND SOMETHING
48600 ABOUT THE CIRCUMSTANCES
48700 THAT BROUGHT YOU HERE?
48800 Pt. - CLOSE TO A YEAR AND Response somewhat appropriate
48900 PARANOIA BROUGHT ME but doesn't tell me much.
49000 HERE The fact that he uses the
49100 word paranoia in the way
49200 that he does without
49300 {RESPONSE 5 PATIENT 7} any other information,
49400 indicates maybe it's a label
49500 he picked up on the ward
49600 or from his doctor.
49700 Lack of any kind of under-
49800 standing about himself.
49900 Dearth, lack of information.
50000 He's in some remission. Seems
50100 somewhat like a put-on. Seems
50200 he was paranoid and is in
50300 some remission at this time.
50400
50500 {7} Dr. - COULD YOU SAY SOMETHING
50600 NOW ABOUT YOUR PARANOID
50700 FEELINGS BOTH AT THE
50800 TIME OF ADMISSION AND
50900 DO YOU HAVE SIMILAR FEELINGS
51000 NOW AND IF SO HOW DO THEY
51100 AFFECT YOU?
51200 Pt. - AT THE TIME OF ADMISSION This response moves paranoia
51300 I THOUGHT THE MAFIA WAS back up. Stretching reality
51400 AFTER ME AND NOW ITS THE somewhat to think Hell's Angels
51500 HELL'S ANGELS are still interested in him.
51600 Somewhat bizarre in terms of
51700 content. Quite paranoid.
51800 {RESPONSE 8 PATIENT 9} Still paranoid. Gross and primitive
51900 responses. In middle of interview I
52000 felt patient was in touch but now
52100 responses have more concrete aspect.
52200
52300 {8} Dr. - DO YOU HAVE ANY THOUGHT
52400 AS TO WHY THESE TWO
52500 GROUPS WERE AFTER YOU?
52600 Pt. - BECAUSE I STOPPED SOME Response seems far fetched
52700 OF THEIR DRUG SUPPLY and hard to believe unless
52800 he was a narcotic agent which
52900 I doubt. Sounds somewhat
53000 {RESPONSE 9 PATIENT 9} grandiose, magical, paranoid
53100 flavor, in general indicates
53200 he's psychotic, paranoid
53300 schizophrenic with delusions
53400 about these two groups and
53500 I wouldn't rule out
53600 some hallucinations as well.
53700 Appropriateness of response
53800 answers question in concrete
53900 but unbelievable way.
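
To make the form of the collected data concrete, here is a minimal sketch of how one rated interview might be represented. The ratings are transcribed from Excerpt 1 above; the names RatedPair, final_overall_rating and mean_response_rating are hypothetical and are not taken from our programs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class RatedPair:
    """One I-O pair with the judge's two ratings on the 0-9 scale."""
    index: int
    response: int  # paranoidness of this particular response
    patient: int   # global estimate of the patient up to this point

    def __post_init__(self) -> None:
        for value in (self.response, self.patient):
            if not 0 <= value <= 9:
                raise ValueError("ratings must lie on the 0-9 scale")


# (Response, Patient) ratings of Judge 1 for I-O pairs 1-14 of Excerpt 1.
_RATINGS = [(1, 5), (1, 5), (5, 5), (6, 5), (2, 6), (9, 9), (1, 9),
            (6, 9), (9, 9), (2, 9), (1, 9), (9, 9), (9, 9), (1, 9)]
EXCERPT_1: List[RatedPair] = [
    RatedPair(i, r, p) for i, (r, p) in enumerate(_RATINGS, start=1)
]


def final_overall_rating(pairs: List[RatedPair]) -> int:
    """The global 'Patient' rating recorded after the last I-O pair."""
    return pairs[-1].patient


def mean_response_rating(pairs: List[RatedPair]) -> float:
    """Average of the per-response ratings over the interview."""
    return mean(p.response for p in pairs)


print(final_overall_rating(EXCERPT_1), round(mean_response_rating(EXCERPT_1), 2))
```

The final overall rating is the quantity compared between model and patient in the first analysis; the series of response ratings feeds the second.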
54000
54100 6.5 ANALYSIS (1)
54200 The prospective protocol judges (N=105) were selected from the 1970
54300 American Psychiatric Association Directory using a table of random
54400 numbers. They were initially not informed that a computer was
54500 involved. (After the experiment the judges were fully informed as to
54600 its purpose and results.) The 105 names were divided into eight
54700 groups. Each member of a group was sent transcripts of three
54800 interviews along with a cover letter requesting his participation
54900 in the experiment. The interview transcripts consisted of:
55000 1) An interview conducted by one of the eight judges with the
55100 paranoid model,
55200 2) An interview conducted by the same interview judge with a
55300 human paranoid patient, and
55400 3) An interview conducted by a different psychiatrist with a
55500 human patient who was not clinically paranoid.
55600
55700 After each input-output pair in the transcripts there were two lines
55800 of rating numbers such that the protocol judges could circle numbers
55900 corresponding to their ratings of both the previous response of the
56000 patient, and an overall evaluation of the patient on the paranoid
56100 continuum. Thirty-three protocol judges returned the rated protocols
56200 properly filled out and all were used in our data.
56300
56400 The interviews with nonparanoid patients were included to
56500 control for the hypothesis that any teletyped interview with a
56600 patient might be judged "paranoid". However, virtually all of the
56700 ratings of the nonparanoid interviews were zero for paranoia. Hence
56800 the hypothesis was falsified.
56900
57000
57100 The first index of resemblance examined was simple: namely,
57200 the final overall rating given the patient and the model. The
57300 question was: "Which was rated as being more paranoid, the patient,
57400 the model, or neither?" (See Table 1). The protocol judges were more
57500 likely than the interview judges to distinguish the overall paranoid
57600 level of the model and the patient. The interview judges gave tied
57700 scores to the model and the patient in 37.5% of the paired interviews,
57800 the protocol judges in only 9%. Of the 35 non-tied paired ratings, 15 rated the
57900 model as being more paranoid. If p is the theoretical probability of
58000 a judge judging the model more paranoid than a human paranoid
58100 patient, we find the 95% confidence interval for p to be 0.27 to
58200 0.59. Since p=0.5 indicates indistinguishability of overall ratings
58300 of model and patient and our observed p=0.43, the results support the
58400 claim that the model is a successful simulation of a paranoid
58500 patient.
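
The interval just quoted can be checked with a short calculation. The method used to obtain it is not stated above; the sketch below assumes the ordinary normal (Wald) approximation for a binomial proportion, which gives bounds within rounding of the reported 0.27 to 0.59. The function name wald_interval is ours.

```python
from math import sqrt


def wald_interval(successes: int, trials: int, z: float = 1.96):
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = successes / trials
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, p_hat - half_width, p_hat + half_width


# 15 of the 35 non-tied paired ratings put the model above the patient.
p_hat, lower, upper = wald_interval(15, 35)
print(round(p_hat, 2), round(lower, 3), round(upper, 3))
contains_half = lower <= 0.5 <= upper  # p = 0.5 means indistinguishable
```

Since the interval contains 0.5, the data do not show that judges systematically rate either the model or the patient as the more paranoid.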
58600
58700 Separate analysis of the strong and weak versions of PARRY
58800 indicated that the strong model was indeed judged more paranoid than
58900 the paranoid patients, the weak version less paranoid. Thus a change
59000 in the parameter structure of the paranoid model produced a change
59100 along the dimension of paranoid behavior in the expected direction.
59200
59300 (TABLE 1
59400 Relative final overall ratings of paranoid model vs. paranoid
59500 patient indicating which was given highest overall rating of paranoia
59600 at end of interview.)
59700 (INSERT TABLE 1 HERE)
59800
59900
60000
60100
60200
60300
60400
60500
60600 6.6 ANALYSIS (2)
60700 The second index of resemblance is a more sensitive measure
60800 based on the two series of response ratings in the paired interviews.
60900 The statistic used is basically the standardized Mann-Whitney
61000 statistic (Siegel,1956).
61100 (INSERT EQUATION HERE)
61200
61300 where R is the sum of the ranks of the response ratings in the series
61400 of ratings given to the model, n the number of responses given by the
61500 model, m the number of responses given by the patient. If the
61600 ratings given by a judge are randomly allocated to model and patient,
61700 i.e. model and patient are indistinguishable in response ratings, the
61800 expected value of Z is 0, with unit standard deviation. If higher
61900 ratings are more likely to be assigned to the model, Z is positive
62000 and conversely, negative values of Z indicate greater likelihood of
62100 assigning higher ratings to the patient. Each judge in evaluating a
62200 pair of interviews generates a single value of Z.
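
As an illustration of the statistic just described, the usual standardized rank-sum form consistent with the definitions of R, n and m is Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12). The sketch below assumes that form; tied ratings receive the average of the ranks they span, and no tie correction is applied to the variance, since none is mentioned above. The function names and the example ratings are ours and are purely illustrative.

```python
from math import sqrt
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def standardized_rank_sum(model: Sequence[float], patient: Sequence[float]) -> float:
    """Z for one judge: positive when the model tends to get the higher ratings."""
    n, m = len(model), len(patient)
    ranks = midranks(list(model) + list(patient))
    R = sum(ranks[:n])                    # sum of ranks of the model's ratings
    expected = n * (n + m + 1) / 2        # mean of R under random allocation
    sd = sqrt(n * m * (n + m + 1) / 12)   # standard deviation of R, ignoring ties
    return (R - expected) / sd


# Made-up response ratings for one pair of interviews (not data from the study).
z = standardized_rank_sum(model=[6, 9, 2, 9], patient=[4, 6, 6, 7])
print(round(z, 3))
```

Exchanging the two series changes only the sign of Z, and identical series give Z = 0.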
62300
62400 The overall mean of the Z scores was +0.044 with the standard
62500 deviation 1.68 (df=40). Thus the overall 95% confidence interval for
62600 the asymptotic mean value of Z is -0.485 to +0.573. The range of Z
62700 values is -3.8 to +4.46. The length of the confidence interval is a
62800 result of the large variance which itself is mainly related to the
62900 contrast between the weak and strong versions. (See TABLES 2 and 3).
63000 Once again the strong version of the model is more paranoid than the
63100 patients, the weak version less paranoid.
63200
63300 (INSERT TABLE 2)
63400 (SUMMARY STATISTICS OF Z RATINGS BY GROUP)
63500
63600
63700
63800
63900
64000
64100
64200
64300
64400 It is not surprising that results using the two indices of
64500 resemblance are parallel, since the indices are highly interrelated.
64600 The mean Z value for the 15 interviews on which the model was rated
64700 more paranoid was +1.28, on the 6 where model and patient tied: 0.41,
64800 on the 20 in which the patient was more paranoid: -0.993. A
64900 positive value of Z was observed when the patient was given an
65000 overall rating greater than the model 6 times; a negative value of Z
65100 when the model was rated more paranoid twice.
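
The three subgroup means fix the overall mean of Z, and with the overall standard deviation of 1.68 (df=40) they also reproduce the confidence interval given earlier. The sketch below is a check of that arithmetic under the assumption that an ordinary t interval was used; 2.021 is the tabled two-sided 95% critical value of t for 40 degrees of freedom.

```python
from math import sqrt

# (number of paired interviews, mean Z) for the three outcome groups.
groups = [(15, 1.28), (6, 0.41), (20, -0.993)]

n_total = sum(n for n, _ in groups)
overall_mean = sum(n * z for n, z in groups) / n_total

SD = 1.68            # overall standard deviation of the Z scores
T_CRIT_DF40 = 2.021  # two-sided 95% critical value of t, 40 degrees of freedom

half_width = T_CRIT_DF40 * SD / sqrt(n_total)
interval = (overall_mean - half_width, overall_mean + half_width)
print(n_total, round(overall_mean, 3), [round(x, 3) for x in interval])
```

The weighted mean comes to about +0.044, and the resulting interval agrees with -0.485 to +0.573 to within rounding.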
65200
65300 (INSERT TABLE 3)
65400 (Analysis of Variance of Z Ratings)
65500
65600
65700
65800
65900
66000
66100
66200
66300
66400
66500
66600
66700
66800 It is worth emphasizing that these tests invited refutation
66900 of the model. The experimental design of the tests put the model in
67000 jeopardy of falsification. If the paranoid model did not survive
67100 these tests, i.e. if it were not considered paranoid by expert
67200 judges and if there were no correlation between the weak-strong
67300 versions of the model and the severity ratings of the judges, then no
67400 claim regarding the success of the simulation could be made. Survival
67500 of a model through a falsification procedure constitutes a
67600 validating step.
67700
67800 6.7 ANALYSIS (3) THE MACHINE QUESTION
67900 For a long time people have wondered how to distinguish a man
68000 from an imitation of a man. The Greeks made statues so lifelike, it
68100 is said, that they had to be chained down to keep them from walking away.
68200 To distinguish a man from a statue, Galileo suggested tickling each
68300 with a feather. To distinguish a man from a machine Descartes
68400 suggested conversational tests. Turing's conversational games have
68500 been discussed on p.000. We were curious how judges using
68600 transcripts might answer the machine question, i.e. which interviewee
68700 is a human and which is the computer model?
68800 To ask the machine-question, we sent interview transcripts,
68900 one with a patient and one with PARRY, to 100 psychiatrists randomly
69000 selected from the Directory of American Specialists and the Directory
69100 of the American Psychiatric Association. Of the 41 replies, 21 (51%)
69200 made the correct identification while 20 (49%) were wrong. Based on
69300 this random sample of 41 psychiatrists, the 95% confidence interval
69400 for correct identification is 35.9% to 66.5%, a range which includes chance (50%).
69500 Psychiatrists are considered expert judges of patient
69600 interview behavior but they are unfamiliar with computers. Hence we
69700 conducted the same test with 100 computer scientists randomly
69800 selected from the membership list of the Association for Computing
69900 Machinery, ACM. Of the 67 replies 32 (48%) were right and 35 (52%)
70000 were wrong. Based on this random sample of 67 computer scientists the
70100 95% confidence interval ranges from 36% to 60%, again close to a chance
70200 level.
70300 So both computer scientists and psychiatrists were unable, at
70400 better than a chance level, to distinguish transcripts of interviews
70500 with the model from transcripts of interviews with real patients.
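
The confidence intervals above can be complemented by an exact two-sided binomial test against the chance level of 0.5. This test is not part of the analysis reported here; it is offered only as a sketch of another way to read the same counts, and the function name is ours.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Correct identifications among the replies received.
p_psychiatrists = binomial_two_sided_p(21, 41)
p_computer_scientists = binomial_two_sided_p(32, 67)
print(round(p_psychiatrists, 3), round(p_computer_scientists, 3))
```

Neither p-value is small, which is another way of saying that the proportion of correct identifications in each group is compatible with guessing.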
70600 But what do we learn from asking the machine question and
70700 finding that the distinction is not made? What we would most like to
70800 know is how to improve the model. Simulation models do not spring
70900 forth in a complete, perfect and final form; they must be gradually
71000 developed over time. Perhaps the patient-model distinction might be
71100 made if we allowed a large number of expert judges to conduct the
71200 interviews themselves rather than studying transcripts of other
71300 interviewers. This would indicate that the model must be improved.
71400 But unless we systematically investigated how the judges succeeded in
71500 making the discrimination, we would not know what aspects of the
71600 model to work on. The logistics of such a design are immense, and
71700 obtaining a large number of judges for sound statistical inference
71800 would require an effort incommensurate with the information yielded.
71900
72000 6.8 ANALYSIS (4) MULTIDIMENSIONAL EVALUATION
72100 A more efficient and informative way to use Turing-like tests
72200 is to ask judges to make ratings along scaled dimensions from
72300 teletyped interviews. This might be called asking the "dimension
72400 question". One can then compare scaled ratings of the patients and
72500 the model in order to determine precisely where and by how much they
72600 differ. In constructing our model we strove for one which exhibited
72700 indistinguishability along some dimensions and distinguishability
72800 along others. That is, we wanted the model to converge on what it was
72900 intended to simulate and to diverge from that which it was not.
73000 Paired-interview transcripts were sent to another 400
73100 randomly selected psychiatrists, who were asked to rate the responses
73200 of the two `patients' along multiple dimensions. The judges were
73300 divided into three groups, and each group was assigned 4 of the 12
73400 dimensions, so that each judge rated the responses of each I-O pair
73500 in the interviews along four dimensions. The twelve dimensions were
73600 linguistic noncomprehension, thought disorder, organic brain
73700 syndrome, bizarreness, anger, fear, ideas of reference, delusions,
73800 mistrust, depression, suspiciousness and mania. These are dimensions
73900 which psychiatrists commonly use in evaluating patients.
74100 (INSERT TABLE 4 HERE)
74200 Table 4 shows there were significant differences, with PARRY
74300 receiving higher scores along the dimensions of linguistic
74400 noncomprehension, thought disorder, bizarreness, anger, mistrust and
74500 suspiciousness. On the delusion dimension the patients were rated
74600 significantly higher. There were no significant differences along
74700 the dimensions of organic brain syndrome, fear, ideas of reference,
74800 depression and mania.
74900 Whereas tests asking the machine-question indicate
75000 indistinguishability at the gross level, a study of the finer
75100 structure of the model's behavior through ratings along scaled
75200 dimensions shows significant differences between patients and model.
75300 These differences are of help to us in suggesting what aspects of the
75400 model should be modified to improve its performance. The graph of
75500 Fig. 2 shows that no modifications are necessary along the dimension
75600 of "organic brain syndrome". But it is clear that PARRY'S
75700 language-comprehension might be improved. Then a future dimensional
75800 test would tell whether improvement had occurred and by how much.
75900 Successive identification of particular areas of failure provides a
76000 type of sensitivity analysis which makes clear what improvements
76100 should be pursued in developing more adequate model versions.
76200 (INSERT FIG. 2 HERE)
76300
76400 6.5 ANALYSIS (5) A RANDOM MODEL
76500 Further evidence that the machine-question is too low a
76600 hurdle and too insensitive a test comes from the following
76700 experiment. We constructed a random version of the
76800 paranoid model (RANDOM-PARRY) which utilized PARRY'S output
76900 statements but selected them at random, independently of what the
77000 interviewer said. Two psychiatrists conducted interviews with this
77100 model, transcripts of which were paired with patient interviews and
77200 sent to 200 randomly selected psychiatrists asking both the
77300 machine-question and the dimension-question. Of the 69 replies to
77400 the machine-question, 34 (49%) were right and 35 (51%) were wrong. Based
77500 on this random sample of 69 psychiatrists, the 95% confidence
77600 interval ranges from 39 to 63, again indicating a chance level of
77700 agreement. When a poor model, such as a random one, passes a test, it
77800 suggests the test is weak.
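A random version of this kind can be sketched in a few lines. The sketch below is illustrative only and is not the original program: the class name RandomResponder is hypothetical, and the statements in the pool are invented placeholders, not taken from PARRY or from the transcripts.

```python
import random

# Invented placeholder statements standing in for the model's output
# repertoire; these are NOT actual PARRY outputs.
OUTPUT_STATEMENTS = [
    "I WOULD RATHER NOT DISCUSS THAT.",
    "WHY DO YOU WANT TO KNOW?",
    "I DO NOT TRUST SOME PEOPLE.",
    "I HAVE BEEN UPSET LATELY.",
]


class RandomResponder:
    """Replies with a statement chosen at random, ignoring the input."""

    def __init__(self, statements, seed=None):
        self._statements = list(statements)
        self._rng = random.Random(seed)

    def reply(self, interviewer_input):
        # The interviewer's input is deliberately unused: the choice of
        # output is independent of what was said.
        return self._rng.choice(self._statements)


responder = RandomResponder(OUTPUT_STATEMENTS, seed=0)
for question in ["HOW ARE YOU?", "WHY ARE YOU IN THE HOSPITAL?"]:
    print(question, "->", responder.reply(question))
```

Because the reply never depends on the input, two responders given the same seed produce the same sequence of statements whatever the interviewer types. That independence is the property that makes the model a poor one.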
77900 (INSERT TABLE 5 HERE)
78000 Although a distinction is not made when the simple
78100 machine-question is asked, definite distinctions ARE made when judgements are
78200 requested along specific dimensions. As shown in Table 5,
78300 significant differences appear along the dimensions of linguistic
78400 noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
78500 rated higher. On these particular dimensions we can construct a
78600 continuum in which the random version represents one extreme, the
78700 actual patients another. Nonrandom PARRY lies somewhere between these
78800 two extremes, indicating that it performs significantly better than
78900 the random version but still requires improvement before it can be
79000 considered indistinguishable from patients relative to these
79100 dimensions. Table 6 presents t values for differences between mean
79200 ratings of PARRY and RANDOM-PARRY. (See Table 6 and Fig. 2 for the
79300 mean ratings).
79400 (INSERT TABLE 6 AND FIG 2 HERE)
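For readers unfamiliar with t values, the following sketch shows one common way to compute a t statistic for the difference between two mean ratings. The text does not say which form of the t test was used, so the unequal-variance (Welch) form here is an assumption. The ratings and the rating scale are invented for illustration and are not the study's data; the function name welch_t is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t statistic for the difference of two means, unequal variances.

    Assumption: the source does not state which t test was used.
    """
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Invented ratings on a hypothetical scale, for one dimension only.
parry_ratings = [2, 3, 3, 4, 2, 3]
random_parry_ratings = [5, 6, 4, 6, 5, 7]

t_value = welch_t(random_parry_ratings, parry_ratings)
print(f"t = {t_value:.2f}")
```

A large positive value means the first sample was rated higher, on average, than the second relative to the spread of the ratings. Judging significance would also require the degrees of freedom, which this sketch does not compute.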
79500 These studies indicate that a more useful way to use
79600 Turing-like tests is to ask expert judges to make ratings along
79700 multiple dimensions that are essential to the model. Thus the model
79800 can serve as an instrument for its own perfection. A good validation
79900 procedure has criteria for better or worse approximations. Useful
80000 tests do not necessarily prove a model; they probe it for its
80100 strengths and weaknesses and clarify what is to be done next in
80200 modifying and repairing the model. Simply asking the machine-question
80300 yields little information relevant to what the model builder most
80400 wants to know, namely, along which dimensions does the model need to
80500 be modified in order to effect an improvement in its performance?
80600
80700 To conclude, it is perhaps historically significant that
80800 these tests were conducted at all. To my knowledge, no one to date
80900 has subjected an interactive simulation model of human symbolic
81000 processes to dimensional indistinguishability tests. These tests set
81100 a precedent and provide a standard against which competing models
81200 might be measured.